As computing systems grow to exascale levels of performance, the smallest elements of a single processor can greatly affect the entire computer system (e.g. its power consumption). As future generations of processors are developed, simulation at the gate level is necessary to ensure that the necessary target performance benchmarks are met prior to fabrication. The most common simulation tools available today utilize either a single node or small clusters and as such create a bottleneck in the development process. This paper focuses on the massively parallel simulation of logic gate circuit models using supercomputer systems. The focus of this performance study leverages the OpenSPARC T2 processor design and the PHDOLD benchmark model using Rensselaer s Optimistic Simulation System (ROSS). We conduct simulations of the crossbar component on a 24-core SMP machine, an IBM Blue Gene/L, and an IBM Blue Gene/Q. Using a single SMP core as the baseline, our performance experiments on 1024 cores of the Blue Gene/L demonstrate more than 131-times faster execution of the OpenSPARC model. We also present the performance results of ROSS executing the Time Warp synchronization protocol using up to 7.8M MPI tasks on 1,966,080 cores of the Sequoia Blue Gene/Q supercomputer system. For the PHOLD benchmark model, we demonstrate the ability to process 33 trillion events in 65 seconds yielding a peak event-rate in excess of 504 billion events/second using 120 racks of Sequoia.